Assessment and Development of POS Tag Set for Telugu

Authors

  • R. J. Rama Sree
  • Uma Maheswara Rao G.
  • K. V. Madhu Murthy
Abstract

In this paper, we first survey existing POS tag sets for European and Indian languages. To date, most research on POS tagging has focused on English. We observed that even though POS tagging for English has been studied exhaustively, part-of-speech annotations across research applications are not comparable, largely because of variations in tag set definitions. The tags in a tag set are determined by the morphosyntactic features of the language, the desired granularity in representing those features, the domain, and similar factors. We then examined how POS tag set design should be handled for Indian languages, taking Telugu as the language under consideration.

Similar resources

Combining Pos Taggers for Improved Accuracy to Create Telugu Annotated Texts for Information Retrieval

POS tagging is the process of assigning the correct POS tag (a noun, verb, adjective, adverb, or other lexical category marker) to each word of a sentence. POS taggers are developed by modeling the morpho-syntactic structure of natural language text. We attempted to improve the accuracy of existing Telugu POS taggers by using a voting algorithm. The three Telugu POS taggers viz., (1) Ru...

Part-of-Speech Tagging and Chunking with Maximum Entropy Model

This paper describes our work on part-of-speech (POS) tagging and chunking for Indian languages, for the SPSAL shared task contest. We use a Maximum Entropy (ME) based statistical model. The tagger makes use of morphological and contextual information of words. Since only a small labeled training set is provided (approximately 21,000 words for all three languages), an ME-based approach does not y...

Morphology Based POS Tagging on Telugu

In this paper, we present morphology-based automatic tagging for Telugu that requires no machine learning algorithm or training data. We believe that in inflectional and agglutinating languages, the critical information required for tagging comes more from word-internal structure than from the context, and we show how a well-designed morphological analyzer can assign correct tags and disa...

Introducing a Machine-Based Approach Using the Lesk Algorithm and Syntactic Tagging for Word Sense Disambiguation

The present study introduces a machine-based approach for word sense disambiguation (WSD). In Persian, a morphologically complex language in which many homographs occur, one way to perform WSD is to assign the right part-of-speech (POS) tags to words prior to WSD. Since the frequency of noun and adjective homographs in different Persian POS-tagged text corpora is high, POS tag disambi...

A Dependency Treebank for Telugu

In this paper, we describe the annotation and development of a Telugu treebank following the Universal Dependencies framework. We manually annotated 1328 sentences from a Telugu grammar textbook, and the treebank is freely available from Universal Dependencies version 2.1. In this paper, we discuss some language-specific annotation issues and decisions, and report preliminary experiments with POS...

Publication date: 2008